Error-tracking clustering gives quantitative statistics to DNA segmentation analysis
Abstract
Inferences drawn from clustering analysis of microarray data cannot be reliably assessed until data-originated errors are quantified, an exacting task that is often skipped. Here we present pair-wise Gaussian merging (PGM), a novel and fast clustering technique suited to this purpose. PGM is designed for systems with normally distributed error. It treats each observation as a Gaussian distribution function whose width is the observation's error, and it uses a simple but exact mathematical relation to track error at every step of clustering. Quantitative statistics are easily extracted from its results. PGM is built on the framework of agglomerative hierarchical clustering, uses the t-value as its distance measure, and requires no linkage criterion. We demonstrate the merits of PGM by applying it to a segmentation algorithm for DNA copy number analysis (SAD). By comparing SAD's performance with that of existing algorithms, we verify that it provides quantitative statistics for its predictions, is simpler in formulation, requires less memory, and offers higher accuracy. For today's typical array sizes, it is also orders of magnitude faster than its nearest competitor. With only two user-adjusted, easily understood parameters, SAD is highly user-friendly. SAD's running time scales linearly with data size, so it is well suited to the challenge of ever-growing array resolution. On a typical modern notebook, SAD completes high-quality copy number analysis of a 250,000-marker array in ∼1 second and of a 1.8-million-marker array in ∼8 seconds.
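The merging and distance steps described above can be illustrated with a small Python sketch. This is a minimal sketch under stated assumptions and not the authors' PGM or SAD implementation. The abstract does not give the exact merging relation, so the sketch assumes inverse-variance weighting. It also assumes the t-value between two Gaussians is |μa − μb| / sqrt(σa² + σb²), restricts merges to adjacent segments because probes are ordered along the genome, and stops at a hypothetical threshold `t_threshold`. All names (`Gaussian`, `merge`, `t_value`, `segment`) are hypothetical. Because every merged cluster is itself a single Gaussian, the distance to any other cluster is defined directly, which is why no linkage criterion is needed.

```python
import math
from dataclasses import dataclass


@dataclass
class Gaussian:
    """A segment summarized as one Gaussian (hypothetical representation)."""
    mean: float   # estimated level of the segment
    var: float    # squared error (width) of that estimate
    count: int    # number of original observations merged into it
    start: int    # index of the first probe in the segment
    end: int      # index of the last probe (inclusive)


def merge(a: Gaussian, b: Gaussian) -> Gaussian:
    """Combine two Gaussians by inverse-variance weighting (assumed rule)."""
    w_a, w_b = 1.0 / a.var, 1.0 / b.var
    var = 1.0 / (w_a + w_b)                    # error of the weighted mean
    mean = var * (w_a * a.mean + w_b * b.mean)
    return Gaussian(mean, var, a.count + b.count,
                    min(a.start, b.start), max(a.end, b.end))


def t_value(a: Gaussian, b: Gaussian) -> float:
    """Difference of means in units of the combined error (assumed distance)."""
    return abs(a.mean - b.mean) / math.sqrt(a.var + b.var)


def segment(values, errors, t_threshold):
    """Greedy agglomerative merging of adjacent segments.

    Repeatedly merges the adjacent pair with the smallest t-value until every
    remaining adjacent pair differs by at least `t_threshold`. This naive loop
    is O(n^2) for clarity; it is not the linear-time scheme the abstract reports.
    """
    segs = [Gaussian(v, e * e, 1, i, i)
            for i, (v, e) in enumerate(zip(values, errors))]
    while len(segs) > 1:
        ts = [t_value(segs[i], segs[i + 1]) for i in range(len(segs) - 1)]
        i = min(range(len(ts)), key=ts.__getitem__)
        if ts[i] >= t_threshold:
            break  # all neighbouring segments are significantly different
        segs[i:i + 2] = [merge(segs[i], segs[i + 1])]
    return segs


# Usage: two flat levels (near 0 and near 1) with a common per-probe error of 0.1.
values = [0.0, 0.1, -0.05, 1.0, 1.1, 0.95]
errors = [0.1] * 6
for s in segment(values, errors, t_threshold=3.0):
    print(f"probes {s.start}-{s.end}: level {s.mean:.3f} +/- {math.sqrt(s.var):.3f}")
```

Each surviving segment carries its own propagated error (`sqrt(var)`), which is the kind of per-prediction quantitative statistic the abstract refers to. With equal per-probe errors, inverse-variance weighting reduces to the plain mean of the merged probes.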
Similar resources
Automatic Prostate Cancer Segmentation Using Kinetic Analysis in Dynamic Contrast-Enhanced MRI
Background: Dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) provides functional information on the microcirculation in tissues by analyzing the enhancement kinetics, which can be used as biomarkers for prostate lesion detection and characterization. Objective: The purpose of this study is to investigate spatiotemporal patterns of tumors by extracting semi-quantitative as well as w...
Experiments on speaker tracking and segmentation in radio broadcast news
In this paper we describe the speaker tracking and clustering system that we implemented for the ESTER evaluation campaign. We present some experiments on normalization in speaker tracking, in particular concerning the use of t-norm for speaker tracking in broadcast news. Results show that the use of t-norm significantly improves the performance at low false alarm rates. In a second part of the...
Performance Analysis of Entropy based methods and Clustering methods for Brain Tumor Segmentation
Brain tumor is the most deadly disease affecting human life span. To segment the brain tumor region, many segmentation techniques have emerged in image processing, such as region-based segmentation and boundary-based segmentation. In this paper, several entropy-based methods and several clustering techniques are compared and analyzed for brain tumor segmentation. Several entropies such as rough e...
Cluster-Based Image Segmentation Using Fuzzy Markov Random Field
Image segmentation is an important task in image processing and computer vision that attracts many researchers' attention. An image carries two kinds of information about its pixels: statistical and structural information, which refer to the feature values of pixel data and the local correlation of pixel data, respectively. Markov random field (MRF) is a tool for modeling statistical and structural inf...
An Improved Clustering-Based Approach for DNA Microarray Image Segmentation
DNA microarrays are powerful techniques used to analyze the expression of DNA in organisms after performing experiments. One of the key issues in experimental approaches that use microarrays is extracting quantitative information from the spots, which represent the genes in the experiments. In this process, separating the background from the foreground is a fundamental problem ...